AITopics | linearly separable

linearly separable

75c45fca2aa416ada062b26cc4fb7641-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 21:01:51 GMT

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Simplicity Bias in 1-Hidden Layer Neural Networks

Neural Information Processing Systems | Dec-24-2025, 02:03:33 GMT

Recent works have demonstrated that neural networks exhibit extreme *simplicity bias* (SB). That is, they learn *only the simplest* features to solve a task at hand, even in the presence of other, more robust but more complex features. Due to the lack of a general and rigorous definition of *features*, these works showcase SB on *semi-synthetic* datasets such as Color-MNIST and MNIST-CIFAR, where defining features is relatively easier. In this work, we rigorously define as well as thoroughly establish SB for *one hidden layer* neural networks in the infinite width regime. More concretely, (i) we define SB as the network essentially being a function of a low dimensional projection of the inputs; (ii) theoretically, we show that when the data is linearly separable, the network primarily depends on only the linearly separable ($1$-dimensional) subspace even in the presence of an arbitrarily large number of other, more complex features which could have led to a significantly more robust classifier; (iii) empirically, we show that models trained on *real* datasets such as ImageNet and Waterbirds-Landbirds indeed depend on a low dimensional projection of the inputs, thereby demonstrating SB on these datasets; (iv) finally, we present a natural ensemble approach that encourages diversity in models by training successive models on features not used by earlier models, and demonstrate that it yields models that are significantly more robust to Gaussian noise.
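
One possible way to write item (i) down, in notation that is ours and not necessarily the paper's: a network $f:\mathbb{R}^d \to \mathbb{R}$ shows simplicity bias if its output is (approximately) unchanged when the input is replaced by a low-rank projection of itself. The symbols $P$, $k$, and $w$ below are illustrative assumptions.

$$
f(x) \approx f(Px) \quad \text{for inputs } x \text{ from the data distribution}, \qquad P \in \mathbb{R}^{d \times d},\ P^2 = P,\ \operatorname{rank}(P) = k \ll d .
$$

Under this reading, the linearly separable case in item (ii) would correspond to $k = 1$ with $P = \dfrac{w w^\top}{\lVert w \rVert^2}$, where $w$ is a direction along which the classes are separated.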

1-hidden layer neural network, name change, simplicity bias, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Predtron: A Family of Online Algorithms for General Prediction Problems

Prateek Jain, Nagarajan Natarajan, Ambuj Tewari

Neural Information Processing Systems | Oct-2-2025, 08:18:05 GMT

Modern prediction problems arising in multilabel learning and learning to rank pose unique challenges to the classical theory of supervised learning. These problems have large prediction and label spaces of a combinatorial nature and involve sophisticated loss functions. We offer a general framework to derive mistake-driven online algorithms and associated loss bounds. The key ingredients in our framework are a general loss function, a general vector space representation of predictions, and a notion of margin with respect to a general norm. Our general algorithm, Predtron, yields the perceptron algorithm and its variants when instantiated on classic problems such as binary classification, multiclass classification, ordinal regression, and multilabel classification. For multilabel ranking and subset ranking, we derive novel algorithms, notions of margins, and loss bounds. A simulation study confirms the behavior predicted by our bounds and demonstrates the flexibility of the design choices in our framework.
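
For readers unfamiliar with the special case mentioned in the abstract, here is a minimal sketch of the classic binary perceptron, the best-known mistake-driven online algorithm. It is not Predtron itself; the function names and the toy data are illustrative assumptions.

```python
def perceptron(samples, epochs=100):
    """Classic perceptron for labels in {-1, +1}; updates only on mistakes."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    mistakes = 0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        mistakes += errors
        if errors == 0:  # a full clean pass: the data is separated
            break
    return w, b, mistakes


def predict(w, b, x):
    """Sign of the linear score, as a label in {-1, +1}."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy linearly separable data (made up for this example).
data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, -0.5), -1)]
w, b, mistakes = perceptron(data)
```

On linearly separable data the loop stops after a pass with no mistakes, which is the behavior that perceptron-style mistake bounds quantify.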

artificial intelligence, inductive learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Asia > India (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages

Hayashi, Katsuhiko, Kamigaito, Hidetaka

arXiv.org Artificial Intelligence | Sep-29-2025

We prove that all standard subregular language classes are linearly separable when represented by their deciding predicates. This establishes finite observability and guarantees learnability with simple linear models. Synthetic experiments confirm perfect separability under noise-free conditions, while real-data experiments on English morphology show that learned features align with well-known linguistic constraints. These results demonstrate that the subregular hierarchy provides a rigorous and interpretable foundation for modeling natural language structure. Our code used in real-data experiments is available at https://github.com/UTokyo-HayashiLab/subregular.
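
To make the idea of separability over deciding predicates concrete, here is a small sketch under simplifying assumptions: a toy strictly 2-local constraint (no occurrence of the bigram `bb`) over a two-letter alphabet, with one 0/1 predicate per bigram. The alphabet, the forbidden set, the weights, and all function names are hypothetical and are not taken from the paper or its repository.

```python
from itertools import product

ALPHABET = "ab"
FORBIDDEN = {"bb"}  # hypothetical constraint: the bigram "bb" may not occur

# One predicate per possible bigram over the alphabet.
BIGRAMS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def predicates(s):
    """0/1 feature vector: does each bigram occur somewhere in s?"""
    return [1 if bg in s else 0 for bg in BIGRAMS]


# A hand-written linear classifier over the predicate features.
WEIGHTS = [-1.0 if bg in FORBIDDEN else 0.0 for bg in BIGRAMS]
BIAS = 0.5


def accepts_linear(s):
    """Accept iff the linear score over the predicates is positive."""
    return sum(w * f for w, f in zip(WEIGHTS, predicates(s))) + BIAS > 0


def accepts_direct(s):
    """Reference membership test: no forbidden bigram occurs."""
    return not any(bg in s for bg in FORBIDDEN)


# All strings of length 0..6 over the alphabet.
strings = ["".join(p) for n in range(7) for p in product(ALPHABET, repeat=n)]
agree = all(accepts_linear(s) == accepts_direct(s) for s in strings)
```

In this toy setting a single hyperplane over the predicate vector reproduces the membership test exactly, which is the kind of statement the abstract makes for the standard subregular classes.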

artificial intelligence, machine learning, predicate, (15 more...)

arXiv.org Artificial Intelligence

2509.22598

Country:

North America > United States > California (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.24)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Active Learning of Classifiers with Label and Seed Queries (Supplementary Material)

Neural Information Processing Systems | Aug-18-2025, 20:51:13 GMT

A.3 Pseudocode of CPLearn and full proof of Theorem 10. We present CPLearn and prove Theorem 10. There are two main obstacles in implementing this process. Let us now turn to the invariants. As shown in Bressan et al. [2021a], this implies [...] Now let us turn to CPLearn. This proves the second invariant.

artificial intelligence, machine learning, query, (18 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Lombardy > Milan (0.04)
Europe > Austria (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)

A Harmonic Space Proofs

Neural Information Processing Systems | Aug-16-2025, 00:25:37 GMT

We prove only one direction. Since this is a constant vector, the two nodes are not separable in the diffusion limit. We state the following result without a proof (see Exercise 4.1 in Bishop [6]). If the sheaf has a trivial global section, all features converge to zero in the diffusion limit. The result follows from applying the trigonometric identity $\cos(\pi/2 + x) = -\sin x$. Idea: We can use rotation matrices to align the harmonic features of the classes with the axis of coordinates as in Figure 6a. It can be checked that these matrices respect the properties outlined above.

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Explicit neural network classifiers for non-separable data

Ewald, Patrícia Muñoz

arXiv.org Machine Learning | Apr-25-2025

We fully characterize a large class of feedforward neural networks in terms of truncation maps. As an application, we show how a ReLU neural network can implement a feature map which separates concentric data.
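
As a rough illustration of how ReLU features can separate concentric data, the sketch below uses a hand-built hidden layer of four ReLU units whose sum equals $|x_1| + |x_2|$. This is our own toy construction, not the truncation-map characterization from the paper; the radii, the threshold, and the function names are assumptions.

```python
import math


def relu(z):
    return max(0.0, z)


def relu_features(x):
    """Hidden layer with four ReLU units: relu(+-x1), relu(+-x2)."""
    x1, x2 = x
    return [relu(x1), relu(-x1), relu(x2), relu(-x2)]


def classify(x, threshold=2.0):
    """Linear readout on the ReLU features; their sum is |x1| + |x2|."""
    return 1 if sum(relu_features(x)) > threshold else -1


def circle(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


inner = circle(1.0, 50)  # class -1
outer = circle(3.0, 50)  # class +1
```

Two concentric circles cannot be split by a line in the input plane, but on a circle of radius $r$ the quantity $|x_1| + |x_2|$ lies in $[r, \sqrt{2}\,r]$, so with radii 1 and 3 a threshold of 2 on the summed features separates the classes.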

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

2504.1871

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.41)

A statistical theory of overfitting for imbalanced classification

Lyu, Jingyang, Zhou, Kangjie, Zhong, Yiqiao

arXiv.org Machine Learning | Feb-16-2025

Classification with imbalanced data is a common challenge in data analysis, where certain classes (minority classes) account for a small fraction of the training data compared with other classes (majority classes). Classical statistical theory based on large-sample asymptotics and finite-sample corrections is often ineffective for high-dimensional data, leaving many overfitting phenomena in empirical machine learning unexplained. In this paper, we develop a statistical theory for high-dimensional imbalanced classification by investigating support vector machines and logistic regression. We find that dimensionality induces truncation or skewing effects on the logit distribution, which we characterize via a variational problem under high-dimensional asymptotics. In particular, for linearly separable data generated from a two-component Gaussian mixture model, the logits from each class follow a normal distribution $\mathsf{N}(0,1)$ on the testing set, but asymptotically follow a rectified normal distribution $\max\{\kappa, \mathsf{N}(0,1)\}$ on the training set, a pervasive phenomenon that we verified on tabular data, image data, and text data. This phenomenon explains why the minority class is more severely affected by overfitting. Further, we show that margin rebalancing, which incorporates class sizes into the loss function, is crucial for mitigating the accuracy drop for the minority class. Our theory also provides insights into the effects of overfitting on calibration and other uncertainty quantification measures.
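
The contrast between the two logit distributions named in the abstract can be visualized with a short simulation. This only samples from $\mathsf{N}(0,1)$ and $\max\{\kappa, \mathsf{N}(0,1)\}$ directly; it does not train a classifier, and the value of `KAPPA` and the sample size are arbitrary assumptions.

```python
import random

random.seed(0)
KAPPA = 0.5  # hypothetical margin level, chosen only for illustration
N = 10_000

# Test-set logits: plain standard normal draws.
test_logits = [random.gauss(0.0, 1.0) for _ in range(N)]

# Training-set logits: independent draws, clipped from below at KAPPA.
train_logits = [max(KAPPA, random.gauss(0.0, 1.0)) for _ in range(N)]

# Share of training logits sitting exactly on the margin.
at_margin = sum(1 for z in train_logits if z == KAPPA) / N
```

The rectified sample has a point mass at `KAPPA` and no values below it, while the unrectified sample spreads freely below that level; this gap between training and test logits is the truncation effect the abstract describes.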

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

2502.11323

Country:

North America > Canada > Ontario (0.27)
North America > United States > Wisconsin (0.27)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
(2 more...)

Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data

Xu, Alec S., Yaras, Can, Wang, Peng, Qu, Qing

arXiv.org Machine Learning | Jan-4-2025

Deep neural networks have attained remarkable success across diverse classification tasks. Recent empirical studies have shown that deep networks learn features that are linearly separable across classes. However, these findings often lack rigorous justifications, even under relatively simple settings. In this work, we address this gap by examining the linear separation capabilities of shallow nonlinear networks. Specifically, inspired by the low intrinsic dimensionality of image data, we model inputs as a union of low-dimensional subspaces (UoS) and demonstrate that a single nonlinear layer can transform such data into linearly separable sets. Theoretically, we show that this transformation occurs with high probability when using random weights and quadratic activations. Notably, we prove this can be achieved when the network width scales polynomially with the intrinsic dimension of the data rather than the ambient dimension. Experimental results corroborate these theoretical findings and demonstrate that similar linear separation properties hold in practical scenarios beyond our analytical scope. This work bridges the gap between empirical observations and theoretical understanding of the separation capacity of nonlinear networks, offering deeper insights into model interpretability and generalization.
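
A toy version of this setting can be sketched as follows, assuming the simplest possible union of subspaces: two random lines through the origin in $\mathbb{R}^5$. The dimensions, the width, the sampling scheme, and the explicit separator `c` are our own illustrative choices, not the paper's construction or proof technique.

```python
import random

random.seed(0)
d, m = 5, 8  # ambient dimension and hidden width (illustrative values)


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


W = [rand_vec(d) for _ in range(m)]  # random first-layer weights


def features(x):
    """Quadratic activation applied to random projections of x."""
    return [dot(w, x) ** 2 for w in W]


u1, u2 = rand_vec(d), rand_vec(d)  # directions of two 1-D subspaces


def sample(u, n):
    """Points a*u with random sign and 0.5 <= |a| <= 2."""
    pts = []
    for _ in range(n):
        a = random.choice([-1, 1]) * random.uniform(0.5, 2.0)
        pts.append([a * ui for ui in u])
    return pts


class1, class2 = sample(u1, 100), sample(u2, 100)

# Each line maps to a ray a^2 * v in feature space, so the difference of
# the two unit ray directions gives a separating hyperplane through 0.
v1, v2 = features(u1), features(u2)
n1, n2 = dot(v1, v1) ** 0.5, dot(v2, v2) ** 0.5
c = [p / n1 - q / n2 for p, q in zip(v1, v2)]

scores1 = [dot(c, features(x)) for x in class1]
scores2 = [dot(c, features(x)) for x in class2]
```

In input space each class straddles the origin, so the convex hulls of the two classes intersect and no hyperplane separates them. After the quadratic layer every point $a u$ maps to $a^2 v$, which removes the sign ambiguity and places each class on its own ray.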

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2501.02364

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Simplicity Bias in 1-Hidden Layer Neural Networks

Neural Information Processing Systems | Oct-10-2024, 03:20:20 GMT

Recent works have demonstrated that neural networks exhibit extreme *simplicity bias* (SB). That is, they learn *only the simplest* features to solve a task at hand, even in the presence of other, more robust but more complex features. Due to the lack of a general and rigorous definition of *features*, these works showcase SB on *semi-synthetic* datasets such as Color-MNIST and MNIST-CIFAR, where defining features is relatively easier. In this work, we rigorously define as well as thoroughly establish SB for *one hidden layer* neural networks in the infinite width regime. More concretely, (i) we define SB as the network essentially being a function of a low dimensional projection of the inputs; (ii) theoretically, we show that when the data is linearly separable, the network primarily depends on only the linearly separable ($1$-dimensional) subspace even in the presence of an arbitrarily large number of other, more complex features which could have led to a significantly more robust classifier; (iii) empirically, we show that models trained on *real* datasets such as ImageNet and Waterbirds-Landbirds indeed depend on a low dimensional projection of the inputs, thereby demonstrating SB on these datasets; (iv) finally, we present a natural ensemble approach that encourages diversity in models by training successive models on features not used by earlier models, and demonstrate that it yields models that are significantly more robust to Gaussian noise.

1-hidden layer neural network, low dimensional projection, simplicity bias, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)